Abstract:
Method and device for rendering an audio sound field representation for audio reproduction. The invention discloses the rendering of sound field signals, such as Higher Order Ambisonics (HOA), for arbitrary speaker configurations, where the rendering results in highly improved localization properties and is energy preserving. This is achieved by a new type of decoding matrix for sound field data, and a new way of obtaining the decoding matrix. In a method for rendering an audio sound field representation for arbitrary spatial speaker configurations, the decoding matrix (D) for rendering to a given target speaker arrangement is obtained by the steps of obtaining a number (L) of speakers, their positions ($\hat{\Omega}_l$), the positions ($\Omega_s$) of a spherical modeling grid and an HOA order (N), generating (141) a mixing matrix (G) from the positions ($\Omega_s$) of the modeling grid and the speaker positions ($\hat{\Omega}_l$), generating (142) a mode matrix ($\tilde{\Psi}$) from the positions ($\Omega_s$) of the spherical modeling grid and the HOA order, calculating (143) a first decoding matrix ($\hat{D}$) from the mixing matrix (G) and the mode matrix ($\tilde{\Psi}$), and smoothing and scaling (144, 145) the first decoding matrix ($\hat{D}$) with smoothing and scaling coefficients.
Publication number: BR112015001128B1
Application number: R112015001128-4
Filing date: 2013-07-16
Publication date: 2021-09-08
Inventors: Johannes Boehm; Florian Keiler
Applicant: Dolby International Ab
Main IPC class:
Description:

FIELD OF THE INVENTION
[001] The present invention relates to a method and a device for rendering an audio sound field representation, and in particular an audio representation in the Ambisonics format, for audio reproduction.
BACKGROUND OF THE INVENTION
[002] Precise localization is a crucial goal for any spatial audio reproduction system. Such reproduction systems are highly applicable to conference systems, games or other virtual environments that benefit from 3D sound. 3D sound scenes can be synthesized or captured as a natural sound field. Sound field signals, such as Ambisonics, carry a representation of a desired sound field. The Ambisonics format is based on the spherical harmonic decomposition of the sound field. While the basic Ambisonics format, or B-format, uses spherical harmonics of order zero and one, the so-called Higher Order Ambisonics (HOA) also uses additional spherical harmonics of at least 2nd order. A decoding or rendering process is required to obtain the individual speaker signals from such signals in the Ambisonics format. The spatial arrangement of the speakers is called the speaker configuration in the present invention. However, while known rendering approaches are suitable only for regular speaker configurations, arbitrary speaker configurations are much more common. If such rendering approaches are applied to arbitrary speaker configurations, the sound directivity deteriorates.
SUMMARY OF THE INVENTION
[003] The present invention describes a method to render/decode an audio sound field representation for both regular and non-regular spatial speaker distributions, wherein the rendering/decoding offers highly improved localization properties and is energy preserving. In particular, the invention offers a new way to obtain the decoding matrix for sound field data, for example in HOA format. Since the HOA format describes a sound field, which is not directly related to speaker positions, and since the speaker signals to be obtained are necessarily in a channel-based audio format, the decoding of HOA signals is always closely related to the rendering of the audio signal. Therefore, the present invention concerns both the decoding and the rendering of audio formats related to the sound field.
[004] An advantage of the present invention is that energy-preserving decoding with good directional properties is obtained. The term “energy preserving” means that the energy within the directive HOA signal is preserved after decoding, so that, for example, a directional spatial sweep with constant amplitude is perceived with constant loudness. The term “good directional properties” refers to a speaker directivity characterized by a directive main lobe and small side lobes, wherein the directivity is increased compared to conventional rendering/decoding.
[005] The invention discloses the rendering of sound field signals, such as Higher Order Ambisonics (HOA), for arbitrary speaker configurations, where the rendering results in highly improved localization properties and is energy preserving. This is achieved by a new type of decoding matrix for sound field data, and a new way to obtain the decoding matrix. In a method for rendering an audio sound field representation for arbitrary spatial speaker configurations, the decoding matrix for rendering to a given arrangement of target speakers is obtained by the steps of obtaining a number of target speakers and their positions, the positions of a spherical modeling grid and an HOA order, generating a mixing matrix from the positions of the modeling grid and the positions of the speakers, generating a mode matrix from the positions of the spherical modeling grid and the HOA order, calculating a first decoding matrix from the mixing matrix and the mode matrix, and smoothing and scaling the first decoding matrix with smoothing and scaling coefficients to obtain an energy-preserving decoding matrix. In one embodiment, the invention relates to a method for decoding and/or rendering an audio sound field representation for audio reproduction according to claim 1. In another embodiment, the invention relates to a device for decoding and/or rendering an audio sound field representation for audio reproduction according to claim 9. In yet another embodiment, the invention relates to a computer-readable medium having stored thereon executable instructions that cause a computer to perform a method for decoding and/or rendering an audio sound field representation for audio reproduction according to claim 15.
[006] Generally, the invention uses the following approach. First, panning functions are derived that depend on the speaker configuration used for playback. Second, a decoding matrix (e.g., an Ambisonics decoding matrix) is calculated from these panning functions (or from a mixing matrix obtained from the panning functions) for all speakers of the speaker configuration. Third, the decoding matrix is generated and processed to be energy preserving. Finally, the decoding matrix is filtered in order to smooth the main lobe of the speaker panning and suppress the side lobes. The filtered decoding matrix is used to render the audio signal for the given speaker configuration. Side lobes are a side effect of rendering and provide audio signals in unwanted directions. Since the rendering is optimized for the given speaker configuration, side lobes are disturbing. One of the advantages of the present invention is that the side lobes are minimized, so that the directivity of the speaker signals is improved. According to an embodiment of the invention, a method for rendering/decoding an audio sound field representation for audio reproduction comprises the steps of buffering received HOA time samples b(t), wherein blocks of M samples and a time index μ are formed, filtering the coefficients B(μ) to obtain frequency-filtered coefficients $\hat{B}(\mu)$, and rendering the frequency-filtered coefficients $\hat{B}(\mu)$ to a spatial domain using a decoding matrix D, wherein a spatial signal W(μ) is obtained. In one embodiment, further steps comprise delaying the time samples w(t) individually for each of the L channels in delay lines, wherein L digital signals are obtained, and digital-to-analog (D/A) converting and amplifying the L digital signals, wherein L analog speaker signals are obtained.
The decoding matrix D for the rendering step, that is, for rendering to a given speaker arrangement, is obtained by the steps of obtaining a number of target speakers and the speaker positions, determining the positions of a spherical modeling grid and an HOA order, generating a mixing matrix G from the positions of the spherical modeling grid and the speaker positions, generating a mode matrix $\tilde{\Psi}$ from the spherical modeling grid and the HOA order, calculating a first decoding matrix from the mixing matrix G and the mode matrix $\tilde{\Psi}$, and smoothing and scaling the first decoding matrix with smoothing and scaling coefficients, wherein the decoding matrix D is obtained.
[007] According to another aspect, a device for decoding an audio sound field representation for audio reproduction comprises a rendering processing unit having a decoding matrix calculation unit for obtaining the decoding matrix D, the decoding matrix calculation unit comprising means for obtaining a number L of target speakers and means for obtaining the positions $\hat{\Omega}_l$ of the speakers, means for determining the positions $\Omega_s$ of a spherical modeling grid and means for obtaining an HOA order N, a first processing unit for generating a mixing matrix G from the positions of the spherical modeling grid and the positions of the speakers, a second processing unit for generating a mode matrix $\tilde{\Psi}$ from the spherical modeling grid and the HOA order N, a third processing unit for performing a compact singular value decomposition of the product of the mode matrix $\tilde{\Psi}$ with the Hermitian transposed mixing matrix $G^H$ according to $U S V^H = \tilde{\Psi} G^H$, where U, V are derived from unitary matrices and S is a diagonal matrix with singular value elements, calculation means for calculating a first decoding matrix $\hat{D}$ from the matrices U, V according to $\hat{D} = V \hat{S} U^H$, where $\hat{S}$ is either an identity matrix or a diagonal matrix derived from said diagonal matrix with singular value elements, and a smoothing and scaling unit for smoothing and scaling the first decoding matrix $\hat{D}$ with smoothing coefficients h, wherein the decoding matrix D is obtained.
[008] According to yet another aspect, a computer-readable medium has stored thereon executable instructions which, when executed on a computer, cause the computer to perform a method for decoding an audio sound field representation for audio reproduction as disclosed above.
[009] Additional objects, aspects and advantages of the invention will become apparent from a consideration of the following description and the appended claims when taken in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[010] Illustrative embodiments of the invention are described with reference to the attached drawings, in which:
[011] Fig. 1 is a flowchart of a method according to an embodiment of the invention;
[012] Fig. 2 is a flowchart of a method for building the mixing matrix G;
[013] Fig. 3 is a block diagram of a renderer;
[014] Fig. 4 is a flowchart of schematic steps of a decoding matrix generation process;
[015] Fig. 5 is a block diagram of a decoding matrix generation unit;
[016] Fig. 6 is an exemplary 16-speaker configuration, where the speakers are illustrated as connected nodes;
[017] Fig. 7 shows the exemplary 16-speaker configuration in natural view, where the nodes are illustrated as speakers;
[018] Fig. 8 is an energy diagram illustrating the ratio Ê/E, which is constant (perfect energy preservation), for a decoding matrix obtained with the prior art [14], with N=3;
[019] Fig. 9 is a sound pressure diagram for a decoding matrix designed according to the prior art [14] with N=3, where the panning beam of the center speaker has strong side lobes;
[020] Fig. 10 is an energy diagram illustrating the ratio Ê/E having fluctuations greater than 4 dB for a decoding matrix obtained with the prior art [2], with N=3;
[021] Fig. 11 is a sound pressure diagram for a decoding matrix designed according to the prior art [2] with N=3, where the panning beam of the center speaker has small side lobes;
[022] Fig. 12 is an energy diagram illustrating the ratio Ê/E having fluctuations of less than 1 dB as obtained by a method or apparatus according to the invention, where spatial pans with constant amplitude are perceived with equal loudness;
[023] Fig. 13 is a sound pressure diagram for a decoding matrix designed with the method according to the invention, in which the center speaker has a panning beam with small side lobes.
DETAILED DESCRIPTION OF THE INVENTION
[024] In general, the invention relates to the rendering (i.e., decoding) of sound field formatted audio signals, such as Higher Order Ambisonics (HOA) audio signals, to speakers, where the speakers are in symmetrical or asymmetrical, regular or non-regular positions. The audio signals may be suitable for feeding more speakers than are available; for example, the number of HOA coefficients may be greater than the number of speakers. The invention provides energy-preserving decoding matrices for decoders with very good directional properties, i.e., the speaker directivity lobes generally comprise a stronger directive main lobe and smaller side lobes than speaker directivity lobes obtained with conventional decoding matrices. Energy preserving means that the energy within the directive HOA signal is preserved after decoding, so that, for example, a directional spatial sweep with constant amplitude is perceived with constant loudness.
[025] Fig. 1 shows a flowchart of a method according to an embodiment of the invention. In this embodiment, the method for rendering (i.e., decoding) an HOA audio sound field representation for audio reproduction uses a decoding matrix that is generated as follows: first, a number L of target speakers, the positions $\hat{\Omega}_l$ of the speakers, a spherical modeling grid $\Omega_s$ and an order N (e.g., HOA order) are determined 11. From the positions $\hat{\Omega}_l$ of the speakers and the spherical modeling grid $\Omega_s$, a mixing matrix G is generated 12, and from the spherical modeling grid $\Omega_s$ and the HOA order N, a mode matrix $\tilde{\Psi}$ is generated 13. A first decoding matrix $\hat{D}$ is calculated 14 from the mixing matrix G and the mode matrix $\tilde{\Psi}$. The first decoding matrix $\hat{D}$ is smoothed 15 with smoothing coefficients, wherein a smoothed decoding matrix $\tilde{D}$ is obtained, and the smoothed decoding matrix $\tilde{D}$ is scaled 16 with a scaling factor obtained from the smoothed decoding matrix $\tilde{D}$, wherein the decoding matrix D is obtained. In one embodiment, smoothing 15 and scaling 16 are performed in a single step. In one embodiment, the smoothing coefficients are obtained by one of two different methods, depending on the number of speakers L and the number of HOA coefficient channels $O_{3D} = (N+1)^2$. If the number of speakers L is below the number of HOA coefficient channels $O_{3D}$, a new method to obtain the smoothing coefficients is used.
[026] In one embodiment, a plurality of decoding matrices corresponding to a plurality of different speaker arrangements are generated and stored for later use. The different speaker arrangements can differ in at least one of the number of speakers, the position of one or more speakers, and the order N of an input audio signal. For rendering, a matching decoding matrix is determined, retrieved from storage according to current needs, and used for decoding.
[027] In one embodiment, the decoding matrix D is obtained by performing a compact singular value decomposition of the product of the mode matrix $\tilde{\Psi}$ with the Hermitian transposed mixing matrix $G^H$ according to $U S V^H = \tilde{\Psi} G^H$, and calculating a first decoding matrix $\hat{D}$ from the matrices U, V according to $\hat{D} = V U^H$. U and V are derived from unitary matrices, and S is a diagonal matrix with the singular value elements of said compact singular value decomposition of the product of the mode matrix $\tilde{\Psi}$ with the Hermitian transposed mixing matrix $G^H$. The decoding matrices obtained according to the present embodiment are generally numerically more stable than the decoding matrices obtained with an alternative embodiment described below. The Hermitian transpose of a matrix is the complex conjugate transpose of the matrix.
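The SVD-based construction of the first decoding matrix can be illustrated with a minimal NumPy sketch. It assumes a mode matrix of shape (O3D, S) and a mixing matrix of shape (L, S) are already available; the function name `first_decode_matrix` and the use of NumPy's reduced SVD as the "compact" SVD are assumptions made for illustration only, not the patent's implementation.

```python
import numpy as np

def first_decode_matrix(mode_matrix, mix_matrix):
    """Hypothetical helper: D_hat = V U^H from the SVD of (mode_matrix @ mix_matrix^H).

    mode_matrix: (O3D, S) array, mix_matrix: (L, S) array.
    Returns D_hat with shape (L, O3D).
    """
    # Reduced SVD of the (O3D x L) product; NumPy returns V^H, not V.
    U, _, Vh = np.linalg.svd(mode_matrix @ mix_matrix.conj().T,
                             full_matrices=False)
    # The singular values are dropped, which amounts to forcing them to one.
    return Vh.conj().T @ U.conj().T

# Usage with random stand-in matrices (O3D = 4, L = 6, S = 20).
rng = np.random.default_rng(0)
D_hat = first_decode_matrix(rng.standard_normal((4, 20)),
                            rng.standard_normal((6, 20)))
```

With at least as many speakers as HOA channels and a full-rank product, the resulting matrix satisfies D_hat^H D_hat = I, which is the energy-preservation property discussed in the text.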
[028] In the alternative embodiment, the decoding matrix D is obtained by performing a compact singular value decomposition of the product of the Hermitian transposed mode matrix $\tilde{\Psi}^H$ with the mixing matrix G according to $U S V^H = G \tilde{\Psi}^H$, wherein a first decoding matrix is derived by $\hat{D} = U V^H$.
[029] In one embodiment, a compact singular value decomposition is performed on the mode matrix $\tilde{\Psi}$ and the mixing matrix G according to $U S V^H = G \tilde{\Psi}^H$, where a first decoding matrix is derived by $\hat{D} = U \hat{S} V^H$, and where $\hat{S}$ is a truncated compact singular value decomposition matrix that is derived from the singular value decomposition matrix S by replacing all singular values greater than or equal to a threshold thr with ones, and replacing elements that are less than the threshold thr with zeros. The threshold thr depends on the actual values of the singular value decomposition matrix and can be, by way of example, on the order of 0.06 * $S_1$ (the maximum element of S). In one embodiment, a compact singular value decomposition is performed on the mode matrix $\tilde{\Psi}$ and the mixing matrix G according to $U S V^H = \tilde{\Psi} G^H$, where a first decoding matrix is derived by $\hat{D} = V \hat{S} U^H$; $\hat{S}$ and the threshold thr are as described above for the previous embodiment. The threshold thr is usually derived from the greatest singular value.
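The thresholding of singular values described above can be sketched as follows. This is an illustration under stated assumptions: the function name `truncated_decode_matrix` and the parameter `rel_thr` are hypothetical, the default of 0.06 follows the example value mentioned in the text, and NumPy's reduced SVD stands in for the compact SVD.

```python
import numpy as np

def truncated_decode_matrix(mode_matrix, mix_matrix, rel_thr=0.06):
    """Hypothetical helper: D_hat = U S_hat V^H from the SVD of G @ mode_matrix^H.

    Singular values >= rel_thr * S_1 are replaced by one, the rest by zero.
    mode_matrix: (O3D, S), mix_matrix: (L, S). Returns (L, O3D).
    """
    U, S, Vh = np.linalg.svd(mix_matrix @ mode_matrix.conj().T,
                             full_matrices=False)
    thr = rel_thr * S[0]              # S is sorted descending, so S[0] is S_1
    S_hat = (S >= thr).astype(float)  # ones for kept values, zeros otherwise
    return (U * S_hat) @ Vh           # equals U @ diag(S_hat) @ V^H

# Tiny example with known singular values 2 and 0.01 (the latter is cut).
mode = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0]])
mix = np.array([[2.0, 0.0, 0.0],
                [0.0, 0.01, 0.0],
                [0.0, 0.0, 5.0]])
D_hat = truncated_decode_matrix(mode, mix)
```

In this example only the direction belonging to the dominant singular value survives, so weakly supported directions no longer contribute to the decoding matrix.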
[030] In one embodiment, two different methods to calculate the smoothing coefficients are used, depending on the HOA order N and the number of target speakers L: if there are fewer target speakers than HOA channels, i.e., if $O_{3D} = (N+1)^2 > L$, the smoothing and scaling coefficients h correspond to a conventional set of max $r_E$ coefficients that are derived from the zeros of the Legendre polynomial of order N + 1; otherwise, if there are enough target speakers, i.e., if $O_{3D} = (N+1)^2 \le L$, the coefficients of h are constructed from the elements $K_i$ of a Kaiser window with length = (2N+1) and width = 2N according to $h = c_f\,[K_{N+1}, K_{N+2}, K_{N+2}, K_{N+2}, K_{N+3}, K_{N+3}, \ldots, K_{2N+1}]^T$ with a scaling factor $c_f$. The used elements of the Kaiser window start with the (N+1)-th element, which is used only once, and continue with subsequent elements that are used repeatedly: the (N+2)-th element is used three times, etc.
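The two ways of choosing smoothing coefficients can be sketched in NumPy as below. Several points are assumptions rather than statements of the source: the Kaiser "width" is mapped to the `beta` argument of `numpy.kaiser`, the max rE weights use the common construction in which the weight of order n is the Legendre polynomial P_n evaluated at the largest zero of P_{N+1}, and the scaling factor c_f is left out. All function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def _expand_per_order(per_order):
    # Repeat the weight of order n for all 2n+1 coefficients of that order.
    N = len(per_order) - 1
    return np.repeat(per_order, 2 * np.arange(N + 1) + 1)

def kaiser_smoothing(N):
    # Assumption: "width = 2N" is interpreted as the Kaiser beta parameter.
    window = np.kaiser(2 * N + 1, 2 * N)
    # Elements K_{N+1} ... K_{2N+1} (1-based), i.e. centre to right edge.
    return _expand_per_order(window[N:])

def max_re_smoothing(N):
    # Largest zero of the Legendre polynomial of order N+1.
    r_e = legendre.legroots([0] * (N + 1) + [1]).max()
    per_order = np.array([legendre.legval(r_e, [0] * n + [1])
                          for n in range(N + 1)])
    return _expand_per_order(per_order)

def smoothing_coefficients(N, L):
    # Fewer speakers than HOA channels: max rE; otherwise Kaiser-based.
    return max_re_smoothing(N) if (N + 1) ** 2 > L else kaiser_smoothing(N)

h = smoothing_coefficients(3, 16)   # 16 speakers, order 3: Kaiser branch
```

Both branches return one weight per HOA channel, so the result can be applied as a diagonal weighting over the (N+1)^2 coefficients.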
[031] In one embodiment, the scaling factor is obtained from the smoothed decoding matrix. In particular, in one embodiment, it is obtained according to

[032] In the following, a complete rendering system is described. The main focus of the invention is the initialization phase of the renderer, where a decoding matrix D is generated as described above. Here, the main focus is a technology to derive one or more decoding matrices, e.g., for a code book. To generate a decoding matrix, it must be known how many target speakers are available and where they are located (i.e., their positions).
[033] Fig. 2 shows a flowchart of a method for building the mixing matrix G, according to an embodiment of the invention. In this embodiment, an initial mixing matrix with all zeros is created 21, and for each virtual source s with an angular direction $\Omega_s = [\theta_s, \phi_s]^T$ and radius $r_s$, the following steps are performed. First, three speakers $l_1, l_2, l_3$ are determined 22 that surround the position $[1, \Omega_s^T]^T$, wherein unit radii are assumed, and a matrix $R = [r_{l_1}, r_{l_2}, r_{l_3}]$ is built 23, with $r_{l_i} = [1, \hat{\Omega}_{l_i}^T]^T$. The matrix R is converted into Cartesian coordinates according to $L_t$ = spherical_to_cartesian(R). Then, a virtual source position is built 25 according to $s = (\sin\theta_s \cos\phi_s, \sin\theta_s \sin\phi_s, \cos\theta_s)^T$, and a gain g is calculated 26 according to $g = L_t^{-1} s$, with $g = (g_{l_1}, g_{l_2}, g_{l_3})^T$. The gain is normalized 27 according to $g = g / \|g\|_2$, and the corresponding elements $G_{l,s}$ of G are replaced by the normalized gains: $G_{l_1,s} = g_{l_1}$, $G_{l_2,s} = g_{l_2}$, $G_{l_3,s} = g_{l_3}$. The following section gives a brief introduction to Higher Order Ambisonics (HOA) and defines the signals to be processed, i.e., rendered for speakers.
[034] Higher Order Ambisonics (HOA) is based on the description of a sound field within a compact area of interest, which is assumed to be free of sound sources. In this case, the spatiotemporal behavior of the sound pressure $p(t, \mathbf{x})$ at time t and at position $\mathbf{x} = [r, \theta, \phi]^T$ within the area of interest (in spherical coordinates: radius r, inclination θ, azimuth φ) is physically fully determined by the homogeneous wave equation. It can be shown that the Fourier transform of the sound pressure with respect to time, i.e.,
$$P(\omega, \mathbf{x}) = \mathcal{F}_t\{p(t, \mathbf{x})\} \qquad (1)$$
where ω denotes the angular frequency (and $\mathcal{F}_t\{\cdot\}$ corresponds to $\int_{-\infty}^{\infty} p(t, \mathbf{x})\, e^{-i\omega t}\, dt$), can be expanded into a series of Spherical Harmonics (SHs) according to [13]:
$$P(k c_s, \mathbf{x}) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} A_n^m(k)\, j_n(kr)\, Y_n^m(\theta, \phi) \qquad (2)$$
[035] In eq. (2), $c_s$ denotes the speed of sound and $k = \omega / c_s$ the angular wavenumber. Furthermore, $j_n(\cdot)$ denotes the spherical Bessel functions of the first kind and order n, and $Y_n^m(\cdot)$ denotes the Spherical Harmonics (SH) of order n and degree m. The complete information about the sound field is actually contained within the sound field coefficients $A_n^m(k)$.
[036] It should be noted that the SHs are complex-valued functions in general. However, by an appropriate linear combination of them, it is possible to obtain real-valued functions and to perform the expansion with respect to these functions.
[037] Related to the description of the pressure sound field in eq. (2), a source field can be defined as:
$$D(k c_s, \Omega) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} B_n^m(k)\, Y_n^m(\Omega) \qquad (3)$$
with the source field or amplitude density [12] $D(k c_s, \Omega)$ depending on the angular wavenumber and the angular direction $\Omega = [\theta, \phi]^T$. A source field can consist of far-field/near-field, discrete/continuous sources [1]. The source field coefficients $B_n^m$ are related to the sound field coefficients $A_n^m$ by
$$A_n^m = \begin{cases} 4\pi\, i^n\, B_n^m & \text{for the far field} \\ -i\,k\, h_n^{(2)}(k r_s)\, B_n^m & \text{for the near field} \end{cases} \qquad (4)$$
where $h_n^{(2)}$ is the spherical Hankel function of the second kind and $r_s$ is the source distance from the origin. Signals in the HOA domain can be represented in the frequency domain or in the time domain as the inverse Fourier transform of the source field or sound field coefficients. The following description assumes the use of a time-domain representation of the source field coefficients:
$$b_n^m = \mathcal{F}_t^{-1}\{B_n^m\} \qquad (5)$$
of a finite number: the infinite series in eq. (3) is truncated at n = N. The truncation corresponds to a spatial bandwidth limitation. The number of coefficients (or HOA channels) is given by:
$$O_{3D} = (N+1)^2 \quad \text{for 3D} \qquad (6)$$
or by $O_{2D} = 2N + 1$ for 2D-only descriptions. The coefficients $b_n^m$ comprise the audio information of one time sample t for later playback through the speakers. They can be stored or transmitted, and are thus subject to data rate compression. A single time sample t of the coefficients can be represented by the vector b(t) with $O_{3D}$ elements:
$$b(t) := [b_0^0(t), b_1^{-1}(t), b_1^0(t), b_1^1(t), b_2^{-2}(t), \ldots, b_N^N(t)]^T \qquad (7)$$
and a block of M time samples by the matrix $B \in \mathbb{C}^{O_{3D} \times M}$:
$$B := [b(t_0), b(t_1), \ldots, b(t_{M-1})] \qquad (8)$$
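The relation between the HOA order and the number of coefficient channels can be made concrete with a short sketch. It assumes the usual channel ordering in which coefficients are sorted by order n and then by degree m from -n to n; the helper names `num_hoa_channels` and `channel_index` are hypothetical.

```python
def num_hoa_channels(order, three_d=True):
    """Number of HOA coefficient channels for a given order N."""
    # (N+1)^2 channels for 3D, 2N+1 channels for 2D-only descriptions.
    return (order + 1) ** 2 if three_d else 2 * order + 1

def channel_index(n, m):
    """Zero-based position of coefficient (n, m) in the vector b(t).

    Assumes ordering by n first, then m = -n ... n.
    """
    if abs(m) > n:
        raise ValueError("degree m must satisfy -n <= m <= n")
    return n * n + n + m

# Order 3 gives 16 channels in 3D and 7 channels in 2D.
channels_3d = num_hoa_channels(3)
channels_2d = num_hoa_channels(3, three_d=False)
```

Under this ordering, a block of M time samples is simply an array with `num_hoa_channels(N)` rows and M columns.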

[038] Two-dimensional representations of sound fields can be derived by an expansion with circular harmonics. This is a special case of the general description presented above, using a fixed inclination of $\theta = \pi/2$, a different weighting of the coefficients and a reduced set of $O_{2D}$ coefficients (m = ±n). Thus, all of the following considerations also apply to 2D representations; the term “sphere” then needs to be replaced by the term “circle”.
[039] In one embodiment, metadata is sent along with the coefficient data, allowing an unambiguous identification of the coefficient data. All the information needed to derive the time sample coefficient vector b(t) is given, either through the transmitted metadata or because of a given context. Furthermore, it is noted that at least one of the HOA order N or $O_{3D}$, and in one embodiment additionally a special flag together with $r_s$ to indicate a near-field recording, are known at the decoder. The following describes the rendering of an HOA signal to speakers. This section shows the basic principle of decoding and some mathematical properties. Basic decoding assumes, first, plane wave speaker signals and, second, that the distance from the speakers to the origin can be neglected. A time sample of HOA coefficients b rendered to L speakers that are located at the spherical directions $\hat{\Omega}_l = [\hat{\theta}_l, \hat{\phi}_l]^T$ with l = 1, ..., L can be described by [10]:
$$w = D\, b \qquad (9)$$
where $w \in \mathbb{R}^{L \times 1}$ represents a time sample of the L speaker signals and D is the decoding matrix, $D \in \mathbb{C}^{L \times O_{3D}}$. A decoding matrix can be derived by
$$D = \Psi^+ \qquad (10)$$
where $\Psi^+$ is the pseudo-inverse of the mode matrix Ψ. The mode matrix Ψ is defined as
$$\Psi = [y_1, \ldots, y_L] \qquad (11)$$
with $\Psi \in \mathbb{C}^{O_{3D} \times L}$ and
$$y_l = [Y_0^0(\hat{\Omega}_l), Y_1^{-1}(\hat{\Omega}_l), \ldots, Y_N^N(\hat{\Omega}_l)]^H$$
consisting of the Spherical Harmonics of the speaker directions $\hat{\Omega}_l = [\hat{\theta}_l, \hat{\phi}_l]^T$, where H denotes the complex conjugate transpose (also known as Hermitian).
[040] Next, the pseudo-inverse of a matrix by Singular Value Decomposition (SVD) is described. A universal way to derive a pseudo-inverse is to first compute the compact SVD:
$$\Psi = U S V^H \qquad (12)$$
where $U \in \mathbb{C}^{O_{3D} \times K}$ and $V \in \mathbb{C}^{L \times K}$ are derived from rotation matrices and $S = \mathrm{diag}(S_1, \ldots, S_K) \in \mathbb{R}^{K \times K}$ is a diagonal matrix of the singular values in descending order $S_1 \ge S_2 \ge \cdots \ge S_K$, with K > 0 and $K \le \min(O_{3D}, L)$. The pseudo-inverse is determined by
$$\Psi^+ = V \hat{S} U^H \qquad (13)$$
where $\hat{S} = \mathrm{diag}(S_1^{-1}, \ldots, S_K^{-1})$. For ill-conditioned matrices with very small values of $S_k$, the corresponding inverse values $S_k^{-1}$ are replaced by zero. This is called Truncated Singular Value Decomposition. Generally, a detection threshold with respect to the greatest singular value $S_1$ is selected to identify the corresponding inverse values to be replaced by zero.
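A pseudo-inverse via truncated SVD, as outlined above, might look as follows in NumPy. The function name `truncated_pinv` and the default relative threshold `rel_thr` are illustrative assumptions; the text does not prescribe a specific threshold for this step.

```python
import numpy as np

def truncated_pinv(Psi, rel_thr=1e-6):
    """Hypothetical helper: pseudo-inverse V S_hat U^H with small values cut.

    Inverse singular values are set to zero when the singular value is not
    larger than rel_thr times the largest singular value S_1.
    """
    U, S, Vh = np.linalg.svd(Psi, full_matrices=False)
    S_inv = np.zeros_like(S)
    keep = S > rel_thr * S[0]      # S[0] is the largest singular value
    S_inv[keep] = 1.0 / S[keep]
    return (Vh.conj().T * S_inv) @ U.conj().T   # V diag(S_inv) U^H

# Well-conditioned case: agrees with NumPy's own pseudo-inverse.
rng = np.random.default_rng(1)
Psi = rng.standard_normal((4, 6))
D = truncated_pinv(Psi)

# Ill-conditioned case: the tiny singular value is discarded.
D_ill = truncated_pinv(np.diag([1.0, 1e-9]))
```

The truncation trades exact inversion for robustness: directions that the speaker layout barely supports are dropped instead of being amplified by a huge inverse singular value.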
[041] The energy preservation property is described below. The signal energy in the HOA domain is given by $E = b^H b$ and the corresponding energy in the spatial domain by $\hat{E} = w^H w = b^H D^H D\, b$.
[042] The ratio $\hat{E}/E$ for an energy-preserving decoding matrix is (substantially) constant. This can only be achieved if $D^H D = c I$, with the identity matrix I and the constant $c \in \mathbb{R}$. This requires D to have a 2-norm condition number cond(D) = 1. This, in turn, requires that the SVD (Singular Value Decomposition) of D produce identical singular values: $D = U S V^H$ with $S = \mathrm{diag}(S_K, \ldots, S_K)$.
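The energy-preservation condition can be checked numerically. The sketch below is an illustration only: `energy_ratio_db` and `is_energy_preserving` are hypothetical helper names, and the test matrices are synthetic (a scaled matrix with orthonormal columns), not decoding matrices from the source.

```python
import numpy as np

def energy_ratio_db(D, b):
    """Ratio of spatial-domain energy to HOA-domain energy, in dB."""
    w = D @ b
    return 10.0 * np.log10(np.vdot(w, w).real / np.vdot(b, b).real)

def is_energy_preserving(D, tol=1e-9):
    """True if D^H D = c I, i.e. all singular values of D are identical."""
    if D.shape[0] < D.shape[1]:
        return False                      # D^H D would be rank deficient
    s = np.linalg.svd(D, compute_uv=False)
    return bool(s[-1] > 0 and s[0] / s[-1] - 1.0 < tol)

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((6, 4)))   # orthonormal columns
D_good = 0.5 * Q                                   # D^H D = 0.25 I
D_bad = Q * np.array([1.0, 1.0, 1.0, 2.0])         # one column scaled
ratio = energy_ratio_db(D_good, rng.standard_normal(4))
```

For `D_good` the ratio is the same for every input vector (here 10 log10(0.25) dB), whereas for `D_bad` it depends on the direction of the input.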
[043] Generally, energy-preserving renderer design is known in the art. An energy-preserving decoding matrix design for $L \ge O_{3D}$ is proposed in [14] by $D = V U^H$ (16), where $\hat{S}$ of eq. (13) is forced to be $\hat{S} = I$ and can thus be dropped in eq. (16). The product $D^H D = U V^H V U^H = I$, and the ratio $\hat{E}/E$ becomes one. A benefit of this design method is the energy preservation, which ensures a homogeneous spatial sound impression where spatial pans have no fluctuations in perceived loudness. A disadvantage of this design is a loss in directivity precision and strong side lobes of the speaker beam for irregular and asymmetrical speaker positions (see Figs. 8-9). The present invention is able to overcome this disadvantage.
[044] In addition, a renderer design for non-regularly positioned speakers is known in the art. In [2], a decoder design method for $L \ge O_{3D}$ and $L < O_{3D}$ is described, which allows rendering with high precision in the reproduced directivity. A disadvantage of this design method is that the derived renderers are not energy preserving (see Figs. 10-11).
[045] Spherical convolution can be used for spatial smoothing. This is a spatial filtering process, or a windowing in the coefficient domain (convolution). Its purpose is to minimize the side lobes, the so-called panning lobes. A new coefficient $\tilde{b}_n^m$ is given by the weighted product of the original HOA coefficient $b_n^m$ and a zonal coefficient $h_n^0$ [5]:
$$\tilde{b}_n^m = 2\pi \sqrt{\frac{4\pi}{2n+1}}\; h_n^0\, b_n^m \qquad (17)$$
[046] This is equivalent to a left convolution on $S^2$ in the spatial domain [5]. Conveniently, this is used in [5] to smooth the directive properties of the speaker signals before rendering/decoding by weighting the HOA coefficients B by:
$$\tilde{B} = \mathrm{diag}(h)\, B \qquad (18)$$
with the vector h containing generally real-valued weighting coefficients and a constant factor $d_f$. The idea of smoothing is to attenuate the HOA coefficients with increasing order index n. Well-known examples of smoothing weighting coefficients h are the so-called max $r_V$, max $r_E$ and in-phase coefficients [4]. The first provides the default amplitude beam (trivial, $h = (1, 1, \ldots, 1)^T$, a vector of length $O_{3D}$ with only ones), the second provides evenly distributed angular power, and in-phase features full side lobe suppression.
[047] Next, additional details and embodiments of the disclosed solution are described. First, a renderer architecture is described in terms of its initialization, start-up behavior, and processing.
[048] Every time the speaker configuration changes, i.e., the number of speakers or the position of any speaker relative to the listening position, the renderer needs to perform an initialization process to determine a set of decoding matrices for every HOA order N that the supported HOA input signals may have. In addition, the individual speaker delays $d_l$ for the delay lines and the speaker gains $g_l$ are determined from the distance between a speaker and the listening position. This process is described below. In one embodiment, the derived decoding matrices are stored within a code book. Every time the HOA audio input characteristics change, a renderer control unit determines the currently valid characteristics and selects a matching decoding matrix from the code book. The code book key can be the HOA order N or, equivalently, $O_{3D}$ (see eq. (6)).
[049] The schematic data processing steps for rendering are explained with reference to Fig. 3, which shows a block diagram of the processing blocks of a renderer. These are a first buffer 31, a Frequency Domain Filtering unit 32, a rendering processing unit 33, a second buffer 34, a delay unit 35 for L channels, and a digital-to-analog converter and amplifier 36.
[050] HOA time samples with time index t and $O_{3D}$ HOA coefficient channels b(t) are first stored in the first buffer 31 to form blocks of M samples with block index μ. The coefficients of B(μ) are frequency filtered in the Frequency Domain Filtering unit 32 to obtain frequency-filtered blocks $\hat{B}(\mu)$. This technology is known (see [3]) to compensate for the distance of spherical speaker sources and to enable the handling of near-field recordings. The frequency-filtered block signals $\hat{B}(\mu)$ are rendered to the spatial domain in the rendering processing unit 33 by:
$$W(\mu) = D\, \hat{B}(\mu) \qquad (19)$$
with $W(\mu) \in \mathbb{R}^{L \times M}$ representing a spatial signal in L channels with blocks of M time samples. The signal is buffered in the second buffer 34 and serialized to form single time samples with time index t in L channels, called w(t) in Fig. 3. This is a serial signal that is fed to L delay lines in the delay unit 35. The delay lines compensate for the different distances from the listening position to the individual speaker l with a delay of $d_l$ samples. In principle, each delay line is a FIFO (first-in, first-out memory). Then, the delay-compensated signals 355 are converted from digital to analog and amplified in the digital-to-analog converter and amplifier 36, which provides signals 365 that can be fed to L speakers. The speaker gain compensation $g_l$ can be applied before the digital-to-analog conversion or by adapting the speaker channel amplification in the analog domain. Renderer initialization works as follows.
[051] First, the number and positions of the speakers must be known. The first step of initialization is to make available the new speaker number L and the related positions $R = [r_1, r_2, \ldots, r_L]$, with $r_l = [r_l, \hat{\theta}_l, \hat{\phi}_l]^T = [r_l, \hat{\Omega}_l^T]^T$, where $r_l$ is
the distance from a listening position to a speaker l, and where $\hat{\theta}_l, \hat{\phi}_l$ are the related spherical angles. Various methods can be applied, for example manual input of the speaker positions or automatic initialization using a test signal. Manual input of the speaker positions can be done using a suitable interface, such as a connected mobile device or a user interface integrated in the device for selecting predefined position sets. Automatic initialization can be done using a microphone array and dedicated speaker test signals, with an evaluation unit to derive R. The maximum distance $r_{max}$ is determined by $r_{max} = \max(r_1, \ldots, r_L)$, the minimum distance $r_{min}$ by $r_{min} = \min(r_1, \ldots, r_L)$. The L distances $r_l$ and $r_{max}$ are input to the delay line and gain compensation 35. The number of delay samples $d_l$ for each speaker channel is determined by
$$d_l = \lfloor (r_{max} - r_l)\, f_s / c + 0.5 \rfloor \qquad (20)$$
with the sampling rate $f_s$, the speed of sound c (c ≅ 343 m/s at a temperature of 20° Celsius) and $\lfloor x + 0.5 \rfloor$ indicating rounding to the nearest integer. To compensate the speaker gains for the different $r_l$, the speaker gains $g_l$ are determined by $g_l = r_l / r_{min}$, or are derived using an acoustic measurement.
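The per-speaker delay and gain compensation can be sketched in plain Python. The sketch assumes that the delay follows the rounding rule of eq. (20) with the distance difference to the farthest speaker, and that the gain is the speaker distance relative to the nearest speaker; the latter is an assumption about the garbled gain formula in the source. The function name `delay_and_gain` is hypothetical.

```python
import math

def delay_and_gain(distances, fs, c=343.0):
    """Delay (in samples) and gain per speaker from listener distances.

    distances: speaker distances r_l in metres, fs: sampling rate in Hz,
    c: speed of sound in m/s (about 343 m/s at 20 degrees Celsius).
    """
    r_max, r_min = max(distances), min(distances)
    # Nearer speakers are delayed so all wavefronts arrive together.
    delays = [math.floor((r_max - r) * fs / c + 0.5) for r in distances]
    # Assumption: gain grows linearly with distance, relative to r_min.
    gains = [r / r_min for r in distances]
    return delays, gains

delays, gains = delay_and_gain([2.0, 3.0, 3.43], fs=48000)
```

With these example distances the farthest speaker gets no delay and the nearest speaker gets unit gain.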
[052] The calculation of decoding matrices, e.g., for the code book, works as follows. The schematic steps of a method for generating the decoding matrix, in one embodiment, are illustrated in Fig. 4. Fig. 5 shows, in one embodiment, processing blocks of a corresponding device for generating the decoding matrix. The inputs are the speaker directions $\hat{\Omega}_l$, a spherical modeling grid $\Omega_s$, and the HOA order N.
[053] The speaker directions $\hat{\Omega}_L = [\hat{\Omega}_1, \ldots, \hat{\Omega}_L]$ can be expressed as spherical angles $\hat{\Omega}_l = [\theta_l, \phi_l]^T$, and the spherical modeling grid $\hat{\Omega}_S = [\hat{\Omega}_1, \ldots, \hat{\Omega}_S]$ by the spherical angles $\hat{\Omega}_s = [\theta_s, \phi_s]^T$. The number of grid directions is selected to be greater than the number of speakers (S > L) and greater than the number of HOA coefficients ($S > O_{3D}$). The grid directions should sample the unit sphere very evenly. Suitable grids are discussed in [6], [9], and can be found in [7], [8]. The grid $\hat{\Omega}_S$ is selected once. As an example, an S = 234 grid of [6] is sufficient for decoding matrices up to HOA order N = 9. Other grids can be used for different HOA orders. The HOA order N is selected incrementally to fill the code book for N = 1, ..., Nmax, with Nmax as the maximum HOA order of the supported HOA input content.
[054] The speaker directions $\hat{\Omega}_L$ and the spherical modeling grid $\hat{\Omega}_S$ are input to a Build Mixed Matrix block 41, which generates a mixed matrix G from them. The spherical modeling grid $\hat{\Omega}_S$ and the HOA order N are input to a Build Mode Matrix block 42, which generates a mode matrix $\tilde{\Psi}$ from them. The mixed matrix G and the mode matrix $\tilde{\Psi}$ are passed to a Build Decoding Matrix block 43, which generates a decoding matrix $\hat{D}$ from them. The decoding matrix is passed to a Smooth Decoding Matrix block 44, which smoothes and scales the decoding matrix. Additional details are presented below. The output of the Smooth Decoding Matrix block 44 is the decoding matrix D, which is stored in the code book with the related key N (or, alternatively, $O_{3D}$). In the Build Mode Matrix block 42, the spherical modeling grid $\hat{\Omega}_S$ is used to construct a mode matrix analogous to eq. (11): $\tilde{\Psi} = [y_1, \ldots, y_S]$ with $y_s = [Y_0^0(\hat{\Omega}_s), \ldots, Y_N^N(\hat{\Omega}_s)]^H$. Note that the mode matrix $\tilde{\Psi}$ is called Ξ in [2].
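The construction of a mode matrix from a grid of directions can be sketched as below. This is only an illustration under stated assumptions: it uses complex spherical harmonics with the Condon-Shortley phase and orthonormal scaling, whereas the convention of eq. (11) is defined outside this section and may differ (for example, real-valued harmonics). The helper names `sph_harm_complex` and `build_mode_matrix` are hypothetical.

```python
import math

import numpy as np


def sph_harm_complex(n, m, theta, phi):
    """Complex Y_n^m for inclination theta and azimuth phi (radians)."""
    am = abs(m)
    x = math.cos(theta)
    # Start value P_am^am, then upward recursion in the degree up to n.
    p_prev = 0.0
    p = (-1.0) ** am * math.prod(range(2 * am - 1, 0, -2)) * (1.0 - x * x) ** (am / 2.0)
    for k in range(am + 1, n + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k + am - 1) * p_prev) / (k - am)
    norm = math.sqrt((2 * n + 1) / (4 * math.pi)
                     * math.factorial(n - am) / math.factorial(n + am))
    y = norm * p * complex(math.cos(am * phi), math.sin(am * phi))
    # Negative orders follow from Y_n^{-m} = (-1)^m conj(Y_n^m).
    return y if m >= 0 else (-1) ** am * y.conjugate()


def build_mode_matrix(grid_dirs, N):
    """Return a ((N+1)^2 x S) matrix whose column s is [Y_0^0 ... Y_N^N]^H at direction s."""
    cols = []
    for theta, phi in grid_dirs:
        y = [sph_harm_complex(n, m, theta, phi)
             for n in range(N + 1) for m in range(-n, n + 1)]
        cols.append(np.conj(y))
    return np.array(cols).T
```

Each column holds the $(N+1)^2$ harmonics of one grid direction, so for S grid points the result has the $O_{3D} \times S$ shape expected by the later matrix product with the mixed matrix.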
[055] In the Build Mixed Matrix block 41, a mixed matrix G is created with $G \in \mathbb{R}^{L \times S}$. Note that the mixed matrix G is called W in [2]. The l-th row of the mixed matrix G consists of the mixing gains used to mix S virtual sources from the directions $\hat{\Omega}_S$ to speaker l. In one embodiment, Vector Base Amplitude Panning (VBAP) [11] is used to derive these mixing gains, as also in [2]. The algorithm for deriving G is summarized below.
1 Create G with zero values (i.e., initialize G)
2 for each s = 1...S
3 {
4 Find 3 speakers $l_1, l_2, l_3$ that surround the position $[1, \hat{\Omega}_s^T]^T$, assuming unit radii, and build the matrix $R = [\mathbf{r}_{l_1}, \mathbf{r}_{l_2}, \mathbf{r}_{l_3}]$ with $\mathbf{r}_{l_i} = [1, \hat{\Omega}_{l_i}^T]^T$.
5 Calculate $L_t$ = spherical_to_cartesian(R) in Cartesian coordinates.
6 Build the virtual source position $\mathbf{s} = (\sin\theta_s \cos\phi_s, \sin\theta_s \sin\phi_s, \cos\theta_s)^T$.
7 Calculate $\mathbf{g} = L_t^{-1}\,\mathbf{s}$, with $\mathbf{g} = (g_{l_1}, g_{l_2}, g_{l_3})^T$.
8 Normalize the gains: $\mathbf{g} = \mathbf{g} / \lVert \mathbf{g} \rVert_2$.
9 Fill the related elements $G_{l,s}$ of G with the elements of g: $G_{l_1,s} = g_{l_1}$, $G_{l_2,s} = g_{l_2}$, $G_{l_3,s} = g_{l_3}$.
10 }
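The ten steps above can be sketched in Python with NumPy. The sketch makes one simplification that is not part of the source: instead of a precomputed speaker triangulation, step 4 is done by brute force over all speaker triplets, keeping the triplet with non-negative gains and the smallest gain sum (a heuristic for the tightest enclosing triangle). The names `sph_to_cart` and `build_mix_matrix` are hypothetical.

```python
import itertools

import numpy as np


def sph_to_cart(theta, phi):
    """Unit vector for inclination theta and azimuth phi (radians)."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])


def build_mix_matrix(speaker_dirs, grid_dirs, tol=1e-9):
    """speaker_dirs: L pairs (theta, phi); grid_dirs: S pairs (theta, phi)."""
    spk = np.array([sph_to_cart(t, p) for t, p in speaker_dirs])
    L, S = len(speaker_dirs), len(grid_dirs)
    G = np.zeros((L, S))                                   # step 1
    triplets = list(itertools.combinations(range(L), 3))
    for s, (theta, phi) in enumerate(grid_dirs):           # step 2
        src = sph_to_cart(theta, phi)                      # step 6
        best, best_sum = None, np.inf
        for tri in triplets:                               # step 4 (brute force)
            Lt = spk[list(tri)].T                          # step 5, columns = speakers
            if abs(np.linalg.det(Lt)) < tol:
                continue                                   # coplanar triplet, skip
            g = np.linalg.solve(Lt, src)                   # step 7
            if g.min() >= -tol and g.sum() < best_sum:
                best, best_sum = (tri, g), g.sum()
        if best is None:
            continue                                       # direction not enclosed
        tri, g = best
        g = np.clip(g, 0.0, None)
        G[list(tri), s] = g / np.linalg.norm(g)            # steps 8 and 9
    return G
```

For large speaker counts the brute-force search grows cubically, so a real renderer would more likely precompute the triangulation once and only test the resulting triangles.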
[056] In the Build Decoding Matrix block 43, the compact singular value decomposition of the matrix product of the mode matrix and the transposed mixed matrix is calculated. This is an important aspect of the present invention, which can be carried out in a number of ways. In one embodiment, the compact singular value decomposition of the matrix product of the mode matrix $\tilde{\Psi}$ and the transposed mixed matrix $G^T$ is calculated according to: $U S V^H = \tilde{\Psi} G^T$
[057] In an alternative embodiment, the compact singular value decomposition of the matrix product of the mode matrix $\tilde{\Psi}$ and the pseudo-inverse mixed matrix $G^+$ is calculated according to: $U S V^H = \tilde{\Psi} G^+$, where $G^+$ is the pseudo-inverse of the mixed matrix G.
[058] In one embodiment, a diagonal matrix $\hat{S} = \mathrm{diag}(\hat{S}_1, \ldots, \hat{S}_K)$ is created, where the first diagonal element is $\hat{S}_1 = 1$, and the following diagonal elements $\hat{S}_k$ are set to a value of one ($\hat{S}_k = 1$) if $S_k \geq a S_1$, where a is a threshold value, or are set to a value of zero ($\hat{S}_k = 0$) if $S_k < a S_1$. A suitable threshold value a was found to be around 0.06. Small deviations, for example within a range of ±0.01 or a range of ±10%, are acceptable. The decoding matrix is then calculated as follows: $\hat{D} = V \hat{S} U^H$
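The decomposition and thresholding described in the preceding paragraphs can be sketched with `numpy.linalg.svd`. The function name `build_decode_matrix` is hypothetical, and the sketch covers the transposed-mix-matrix embodiment only; it assumes G is real-valued, as VBAP gains are.

```python
import numpy as np


def build_decode_matrix(Psi, G, a=0.06):
    """Psi: (O3D x S) mode matrix, G: (L x S) mixed matrix, a: threshold."""
    # Compact SVD of the (O3D x L) product; S is sorted in descending order.
    U, S, Vh = np.linalg.svd(Psi @ G.T, full_matrices=False)
    # Replace singular values by 1 if S_k >= a * S_1, otherwise by 0.
    S_hat = (S >= a * S[0]).astype(float)
    # D_hat = V diag(S_hat) U^H, with shape (L x O3D).
    return (Vh.conj().T * S_hat) @ U.conj().T
```

When no singular value is discarded and $O_{3D} \leq L$, the result satisfies $\hat{D}^H \hat{D} = I$, which is one way to see why replacing the singular values by ones supports energy preservation.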
[059] In the Smooth Decoding Matrix block 44, the decoding matrix is smoothed. Instead of applying smoothing coefficients to the HOA coefficients before decoding, as known in the prior art, they can be combined directly with the decoding matrix. This saves a processing step, or processing block, respectively. $\tilde{D} = \hat{D}\,\mathrm{diag}(h)$ (21)
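A short sketch of eq. (21) shows why folding the smoothing coefficients into the decoding matrix is equivalent to weighting the HOA coefficients first. The function name `smooth_decode_matrix` is a hypothetical choice for illustration.

```python
import numpy as np


def smooth_decode_matrix(D_hat, h):
    """Eq. (21): scale column q of D_hat by h[q], i.e. D_hat @ diag(h)."""
    return D_hat * np.asarray(h)[np.newaxis, :]


rng = np.random.default_rng(1)
D_hat = rng.standard_normal((3, 4))
h = np.array([1.0, 0.8, 0.8, 0.8])
b = rng.standard_normal(4)

# Weighting the coefficients first or using the pre-smoothed matrix gives the same output.
print(np.allclose(smooth_decode_matrix(D_hat, h) @ b, D_hat @ (h * b)))  # True
```

Since the product is computed once at initialization, the per-sample weighting of the coefficient vector is no longer needed at run time.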
[060] In order to obtain satisfactory energy preservation properties also for decoders for HOA content with more coefficients than speakers (i.e. $O_{3D} > L$), the applied smoothing coefficients $h$ are selected depending on the HOA order N ($O_{3D} = (N + 1)^2$):
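[061] For $L \geq O_{3D}$, $h$ corresponds to the max $r_E$ coefficients derived from the zeros of the Legendre polynomials of order N + 1, as in [4]. For $L < O_{3D}$, the coefficients of $h$ are constructed from a Kaiser window as follows: $\mathcal{K} = \mathrm{KaiserWindow}(len, width)$ (22) with $len = 2N + 1$, $width = 2N$, where $\mathcal{K}$ is a vector with 2N + 1 real-valued elements. The elements are created by the Kaiser window formula $\mathcal{K}_i = I_0\!\left(width \sqrt{1 - \left(\frac{2(i-1)}{len - 1} - 1\right)^2}\right) / I_0(width)$, for $i = 1, \ldots, len$,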
[062] where $I_0(\cdot)$ denotes the zero-order modified Bessel function of the first kind. The vector $h$ is constructed from the elements of $\mathcal{K}$: $h = c_f\,[\mathcal{K}_{N+1}, \mathcal{K}_{N+2}, \mathcal{K}_{N+2}, \mathcal{K}_{N+2}, \mathcal{K}_{N+3}, \mathcal{K}_{N+3}, \ldots, \mathcal{K}_{2N+1}]^T$, where every element $\mathcal{K}_{N+1+n}$ gets 2n + 1 repetitions for the HOA order index n = 0..N, and $c_f$ is a constant scaling factor to maintain equal loudness between programs of different HOA order. That is, the used elements of the Kaiser window start with the (N+1)-th element, which is used only once, and continue with subsequent elements that are used repeatedly: the (N+2)-th element is used three times, etc.
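The selection between the two sets of smoothing coefficients can be sketched as below. Several points are assumptions rather than statements of the source: the max $r_E$ weights are taken in their common form $P_n(r_E)$, with $r_E$ the largest zero of the Legendre polynomial $P_{N+1}$; the window parameter `width` is taken to play the role of the `beta` argument of `numpy.kaiser`; and the boundary case $L = O_{3D}$ is assigned to the max $r_E$ branch. All function names are hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre


def max_re_weights(N):
    """Per-order weights P_n(r_E), with r_E the largest zero of P_{N+1}."""
    r_e = Legendre.basis(N + 1).roots().real.max()
    return np.array([Legendre.basis(n)(r_e) for n in range(N + 1)])


def kaiser_weights(N):
    """Second half of a Kaiser window with len = 2N + 1 and width = 2N."""
    window = np.kaiser(2 * N + 1, 2 * N)
    return window[N:]  # elements N+1 ... 2N+1 in 1-based counting


def smoothing_coefficients(N, L, c_f=1.0):
    """Vector h with (N+1)^2 entries; the weight of order n is repeated 2n + 1 times."""
    o3d = (N + 1) ** 2
    per_order = max_re_weights(N) if L >= o3d else kaiser_weights(N)
    return c_f * np.repeat(per_order, 2 * np.arange(N + 1) + 1)
```

For example, `smoothing_coefficients(3, 5)` takes the Kaiser branch because 5 speakers are fewer than the 16 coefficients of order 3, and returns 16 weights that start at 1 and decrease with the order index.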
[063] In one embodiment, the smoothed decoding matrix is scaled. In one embodiment, the scaling is performed in the Smooth Decoding Matrix block 44, as shown in Fig. 4 a). In a different embodiment, the scaling is performed as a separate step in a Scale Matrix block 45, as shown in Fig. 4 b).
[064] In one embodiment, the constant scaling factor is obtained from the decoding matrix. In particular, it can be obtained according to the so-called Frobenius norm of the decoding matrix: $c_f = 1 / \sqrt{\sum_{l=1}^{L} \sum_{q=1}^{O_{3D}} |\tilde{d}_{l,q}|^2}$, where $\tilde{d}_{l,q}$ is the matrix element in row l and column q of the matrix $\tilde{D}$ (after smoothing). The normalized matrix is $D = c_f \tilde{D}$.
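The Frobenius-norm scaling can be illustrated with a few lines of NumPy. The sketch assumes that the scaling factor is the reciprocal of the Frobenius norm of the smoothed matrix, so that the scaled matrix has unit Frobenius norm; the function name `scale_decode_matrix` is hypothetical.

```python
import numpy as np


def scale_decode_matrix(D_tilde):
    """Return the normalized matrix D = c_f * D_tilde and the factor c_f."""
    c_f = 1.0 / np.linalg.norm(D_tilde, 'fro')
    return c_f * D_tilde, c_f


D, c_f = scale_decode_matrix(np.array([[3.0, 0.0], [0.0, 4.0]]))
print(c_f)                       # 0.2
print(np.linalg.norm(D, 'fro'))  # 1.0
```

Normalizing every stored decoding matrix this way gives matrices of different HOA order the same overall gain, which matches the stated purpose of keeping loudness equal between programs.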
[065] Fig. 5 shows, according to an aspect of the invention, a device for decoding an audio sound field representation for audio reproduction. It comprises a rendering processing unit 33 having a decoding matrix calculating unit 140 for obtaining the decoding matrix D, the decoding matrix calculating unit 140 comprising means 1x for obtaining a number L of target speakers and means for obtaining the positions $\mathfrak{D}_L$ of the speakers, means 1y for determining the positions $\hat{\Omega}_S$ of a spherical modeling grid and means 1z for obtaining an HOA order N, a first processing unit 141 for generating a mixed matrix G from the positions of the spherical modeling grid and the speaker positions, a second processing unit 142 for generating a mode matrix $\tilde{\Psi}$ from the positions of the spherical modeling grid and the HOA order N, a third processing unit 143 for performing a compact singular value decomposition of the product of the mode matrix $\tilde{\Psi}$ with the Hermitian transposed mixed matrix G according to $U S V^H = \tilde{\Psi} G^H$, where U, V are derived from unitary matrices and S is a diagonal matrix with singular value elements, calculation means 144 for calculating a first decoding matrix $\hat{D}$ from the matrices U, V according to $\hat{D} = V \hat{S} U^H$, and a smoothing and scaling unit 145 for smoothing and scaling the first decoding matrix $\hat{D}$ with smoothing coefficients $h$, whereby the decoding matrix D is obtained. In one embodiment, the smoothing and scaling unit 145 comprises a smoothing unit 1451 for smoothing the first decoding matrix $\hat{D}$, whereby a smoothed decoding matrix $\tilde{D}$ is obtained, and a scaling unit 1452 for scaling the smoothed decoding matrix $\tilde{D}$, whereby the decoding matrix D is obtained.
[066] Fig. 6 shows speaker positions in an illustrative 16-speaker configuration in a schematic diagram of nodes, where the speakers are illustrated as connected nodes. Foreground connections are illustrated as solid lines, background connections as dashed lines. Fig. 7 shows the same speaker configuration with 16 speakers in a reduced view.
[067] Next, we describe example results obtained with the speaker configuration in Figs. 6 and 7. The energy distribution of the sound signal, and in particular the ratio Ê/E, is illustrated in dB on the 2-sphere (all test directions). As an example of a speaker panning beam, the center speaker beam (speaker 7 in Fig. 6) is illustrated. For example, a decoder matrix that is designed as in [14], with N = 3, produces a ratio Ê/E as shown in Fig. 8. It provides almost perfect energy preservation characteristics, since the ratio Ê/E is almost constant: the differences between dark areas (corresponding to lower volumes) and light areas (corresponding to higher volumes) are less than 0.01 dB. However, as shown in Fig. 9, the corresponding panning beam of the center speaker has strong side lobes. This disturbs spatial perception, especially for off-center listeners.
[068] On the other hand, a decoder matrix that is designed as in [2], with N = 3, produces a ratio Ê/E as shown in Fig. 10. On the scale used in Fig. 10, the dark areas correspond to lower volumes down to -2 dB and the bright areas to higher volumes up to +2 dB. Thus, the ratio Ê/E shows fluctuations greater than 4 dB, which is disadvantageous since spatial pans with constant amplitude, for example from the top speaker position to the center speaker position, cannot be perceived with equal loudness. However, as shown in Fig. 11, the corresponding panning beam of the center speaker has very small side lobes, which is beneficial for off-center listening positions.
[069] Fig. 12 shows the energy distribution of a sound signal that is obtained with a decoder matrix according to the present invention, by way of example for N = 3 for easy comparison. The scale of the ratio Ê/E (illustrated on the right side of Fig. 12) ranges from 3.15 dB to 3.45 dB. Thus, fluctuations in the ratio are less than 0.31 dB, and the energy distribution in the sound field is very uniform. Consequently, any spatial pans with constant amplitude are perceived with equal loudness. The panning beam of the center speaker has very small side lobes, as shown in Fig. 13. This is beneficial for off-center listening positions, where side lobes can be audible and would therefore be distracting. Thus, the present invention combines the advantages that can be obtained with the prior art in [14] and [2], without suffering their respective disadvantages.
[070] Note that whenever a speaker is mentioned here, a sound emitting device such as a loudspeaker is meant. The flowchart and/or block diagrams in the figures illustrate the configuration, operation and functionality of possible implementations of systems, methods and computer program products in accordance with various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams can represent a module, segment or piece of code, which comprises one or more executable instructions to implement the specified logic functions. It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order noted in the figures. For example, two blocks presented in succession can actually be executed substantially simultaneously, or blocks can be executed in reverse order, or blocks can be executed in an alternate order, depending on the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions. Although not explicitly described, the present embodiments may be employed in any combination or sub-combination.
[071] In addition, as will be appreciated by those skilled in the art, aspects of these principles may be embodied as a system, method, or computer-readable medium. Therefore, aspects of these principles may take the form of an all-hardware embodiment, an all-software embodiment (including firmware, resident software, microcode, and so on), or an embodiment combining software and hardware aspects, all of which can generally be called here a "circuit", "module" or "system". Furthermore, aspects of the present principles may take the form of a computer-readable storage medium. Any combination of one or more computer-readable storage media can be used. A computer-readable storage medium, as used herein, is considered a non-transitory storage medium, given the intrinsic ability to store information in it, as well as the intrinsic ability to provide retrieval of information from it.
[072] In addition, it will be appreciated by those skilled in the art that the block diagrams presented in this document represent conceptual views of illustrative system components and/or circuitry embodying the principles of the invention. Similarly, it will be appreciated that any flowcharts, data flow diagrams, state transition diagrams, pseudocode, and the like represent various processes that can be substantially represented on computer-readable storage media and therefore executed by a computer or processor, whether or not such computer or processor is explicitly illustrated.
Cited References
[1] T.D. Abhayapala. Generalized framework for spherical microphone arrays: Spatial and frequency decomposition. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), (accepted) Vol. X, pp. , April 2008, Las Vegas, USA.
[2] Johann-Markus Batke, Florian Keiler, and Johannes Boehm. Method and device for decoding an audio soundfield representation for audio playback. International Patent Application WO2011/117399 (PD100011).
[3] Jérôme Daniel, Rozenn Nicol, and Sébastien Moreau. Further investigations of high order ambisonics and wavefield synthesis for holophonic sound imaging. In AES Convention Paper 5788, presented at the 114th Convention, March 2003. Paper 4795 presented at the 114th Convention.
[4] Jérôme Daniel. Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia. PhD thesis, Université Paris 6, 2001.
[5] James R. Driscoll and Dennis M. Healy Jr. Computing Fourier transforms and convolutions on the 2-sphere. Advances in Applied Mathematics, 15:202-250, 1994.
[6] Jörg Fliege. Integration nodes for the sphere. http://www.personal.soton.ac.uk/jf1w07/nodes/nodes.html, Online, accessed 2012-06-01.
[7] Jörg Fliege and Ulrike Maier. A two-stage approach for computing cubature formulae for the sphere. Technical Report, Fachbereich Mathematik, Universität Dortmund, 1999.
[8] R.H. Hardin and N.J.A. Sloane. Webpage: Spherical designs, spherical t-designs. http://www2.research.att.com/~njas/sph-designs/.
[9] R.H. Hardin and N.J.A. Sloane. McLaren's improved snub cube and other new spherical designs in three dimensions. Discrete and Computational Geometry, 15:429-441, 1996.
[10] M.A. Poletti. Three-dimensional surround sound systems based on spherical harmonics. J. Audio Eng. Soc., 53(11):1004-1025, November 2005.
[11] Ville Pulkki. Spatial Sound Generation and Perception by Amplitude Panning Techniques. PhD thesis, Helsinki University of Technology, 2001.
[12] Boaz Rafaely. Plane-wave decomposition of the sound field on a sphere by spherical convolution. J. Acoust. Soc. Am., 4(116):2149-2157, October 2004.
[13] Earl G. Williams. Fourier Acoustics, volume 93 of Applied Mathematical Sciences. Academic Press, 1999.
[14] F. Zotter, H. Pomberger, and M. Noisternig. Energy-preserving ambisonic decoding. Acta Acustica united with Acustica, 98(1):37-47, January/February 2012.
Claims:
Claims (19)
[0001]
1. Method for rendering a Higher Order Ambisonics (HOA) representation of a sound or sound field, characterized by comprising: - decoding coefficients of the HOA sound field representation; - determining a mixed matrix (G) from the L speakers and the positions of the spherical modeling grid related to an HOA order (N); - determining a mode matrix ($\tilde{\Psi}$) from the spherical modeling grid and the HOA order (N); - rendering the coefficients of the HOA sound field representation from a frequency domain to a spatial domain based on a smoothed decoding matrix ($\tilde{D}$); wherein a compact singular value decomposition of a product of the mode matrix ($\tilde{\Psi}$) with the Hermitian transposed mixed matrix ($G^H$) is determined according to $U S V^H = \tilde{\Psi} G^H$, where U, V are derived from unitary matrices and S is based on a diagonal matrix with singular value elements, and a first decoding matrix ($\hat{D}$) is determined from the matrices U, V according to $\hat{D} = V \hat{S} U^H$, where $\hat{S}$ is a truncated compact singular value decomposition matrix which is either an identity matrix or a modified diagonal matrix, the modified diagonal matrix being derived from said diagonal matrix with first and second singular value elements, wherein at least one first singular value element that is greater than or equal to a threshold equals one, and at least one second singular value element that is less than the threshold equals zero; and wherein a smoothed decoding matrix ($\tilde{D}$) is determined according to the smoothing and scaling of the first decoding matrix ($\hat{D}$) with smoothing coefficients.
[0002]
2. Method according to claim 1, characterized in that the smoothing is according to a first smoothing method based on a determination of $L \geq O_{3D}$, and the smoothing is according to a second smoothing method based on a determination of $L < O_{3D}$, where $O_{3D} = (N + 1)^2$, and where a smoothed decoding matrix ($\tilde{D}$) is obtained based on the smoothing.
[0003]
3. Method according to claim 2, characterized in that the second smoothing method is based on weighting coefficients $h$ which are based on the elements of a Kaiser window.
[0004]
4. Method according to claim 3, characterized by the fact that the Kaiser window is obtained according to $\mathcal{K} = \mathrm{KaiserWindow}(len, width)$, with $len = 2N + 1$, $width = 2N$, where $\mathcal{K}$ is a vector with $2N + 1$ real-valued elements from
[0005]
5. Method according to claim 1, characterized in that the first decoding matrix ($\hat{D}$) is smoothed (44) to obtain a smoothed decoding matrix ($\tilde{D}$), and a constant scaling factor $c_f$ is obtained from the Frobenius norm of the smoothed decoding matrix ($\tilde{D}$).
[0006]
6. Method according to claim 1, characterized in that the first decoding matrix ($\hat{D}$) is smoothed (44) to obtain a smoothed decoding matrix ($\tilde{D}$), and the smoothed decoding matrix ($\tilde{D}$) is scaled according to a constant scaling factor $c_f$.
[0007]
7. Method according to claim 2, characterized in that the first smoothing method is based on weighting coefficients $h$ which are based on the zeros of Legendre polynomials of order N + 1.
[0008]
8. Method according to claim 1, characterized in that it further comprises - temporarily storing and serializing (34) a spatial signal W which is obtained according to the rendering of the coefficients of the HOA sound field representation, in which temporal samples w(t) for L channels are obtained; and - delaying (35) the time samples w(t) individually for each of the L channels on the delay lines, wherein L digital signals (355) are obtained; and where the delay lines compensate for different speaker distances.
[0009]
9. Device for rendering a Higher Order Ambisonics (HOA) representation of a sound or sound field, characterized by comprising: - a decoder configured to decode coefficients of the HOA sound field representation, the decoder including: - a renderer configured to render coefficients of the HOA sound field representation from a frequency domain to a spatial domain according to a smoothed decoding matrix ($\tilde{D}$); - a processing unit configured to determine a mixed matrix (G) from the L speakers and the positions of a spherical modeling grid related to an HOA order (N) and to determine a mode matrix ($\tilde{\Psi}$) from the spherical modeling grid and the HOA order (N); wherein the processing unit is further configured to determine a compact singular value decomposition of the product of the mode matrix ($\tilde{\Psi}$) with a Hermitian transposed mixed matrix ($G^H$) according to $U S V^H = \tilde{\Psi} G^H$, where U, V are derived from unitary matrices and S is based on a diagonal matrix with singular value elements, and wherein the processing unit is further configured to determine a first decoding matrix ($\hat{D}$) from the matrices U, V according to $\hat{D} = V \hat{S} U^H$, where $\hat{S}$ is a truncated compact singular value decomposition matrix which is either an identity matrix or a modified diagonal matrix, the modified diagonal matrix being determined according to the diagonal matrix with first and second singular value elements, wherein at least one first singular value element that is greater than or equal to a threshold equals one, and at least one second singular value element that is less than the threshold equals zero; and wherein a smoothed decoding matrix ($\tilde{D}$) is determined according to the smoothing and scaling of the first decoding matrix ($\hat{D}$) with smoothing coefficients.
[0010]
10. Device according to claim 9, characterized in that the decoder is configured to apply the smoothed decoding matrix ($\tilde{D}$) to the HOA sound field representation to determine a decoded audio signal.
[0011]
11. Device according to claim 9, characterized in that it further comprises storage means to store the smoothed decoding matrix ($\tilde{D}$).
[0012]
12. Device according to claim 9, characterized in that the smoothing is according to a first smoothing method based on a determination of $L \geq O_{3D}$, and the smoothing is according to a second smoothing method based on a determination of $L < O_{3D}$, where $O_{3D} = (N + 1)^2$, and where the smoothed decoding matrix ($\tilde{D}$) is obtained by the smoothing.
[0013]
13. Device according to claim 12, characterized by the fact that the second smoothing method is based on weighting coefficients $h$ which are based on elements of a Kaiser window.
[0014]
14. Device according to claim 9, characterized in that the processing unit is further configured to smooth the first decoding matrix ($\hat{D}$) to obtain a smoothed decoding matrix ($\tilde{D}$), and the processing unit is further configured to determine a constant scaling factor $c_f$ from the Frobenius norm of the smoothed decoding matrix ($\tilde{D}$).
[0015]
15. A non-transitory computer-readable medium having stored executable instructions for making a computer perform a method for rendering a Higher Order Ambisonics (HOA) representation of a sound or sound field, the method characterized by comprising: - decoding coefficients of the HOA sound field representation; - determining a mixed matrix (G) from the L speakers and the positions of the spherical modeling grid related to an HOA order (N); - determining a mode matrix ($\tilde{\Psi}$) from the spherical modeling grid and the HOA order (N); - rendering the coefficients of the HOA sound field representation from a frequency domain to a spatial domain based on a smoothed decoding matrix ($\tilde{D}$); wherein a compact singular value decomposition of a product of the mode matrix ($\tilde{\Psi}$) with the Hermitian transposed mixed matrix ($G^H$) is determined according to $U S V^H = \tilde{\Psi} G^H$, where U, V are derived from unitary matrices and S is based on a diagonal matrix with singular value elements, and a first decoding matrix ($\hat{D}$) is determined from the matrices U, V according to $\hat{D} = V \hat{S} U^H$, and where $\hat{S}$ is a truncated compact singular value decomposition matrix which is either an identity matrix or a modified diagonal matrix, the modified diagonal matrix being derived from said diagonal matrix with first and second singular value elements, wherein at least one first singular value element that is greater than or equal to a threshold is equal to one, and at least one second singular value element that is less than the threshold is equal to zero; and wherein a smoothed decoding matrix ($\tilde{D}$) is determined according to the smoothing and scaling of the first decoding matrix ($\hat{D}$) with smoothing coefficients.
[0016]
16. Method according to claim 1, characterized in that the threshold depends on the values of the diagonal matrix with singular value elements.
[0017]
17. Method according to claim 16, characterized in that the threshold depends on a maximum element S1 of the diagonal matrix with singular value elements.
[0018]
18. Device according to claim 9, characterized in that the threshold depends on the values of the diagonal matrix with elements of singular value.
[0019]
19. Device according to claim 18, characterized by the fact that the threshold depends on a maximum element S1 of the diagonal matrix with elements of singular value.
Similar technologies:
Publication number | Publication date | Patent title
BR112015001128B1|2021-09-08|METHOD AND DEVICE FOR RENDERING A REPRESENTATION OF A SOUND OR SOUND FIELD AND A COMPUTER-READABLE MEDIUM
Patent family:
Publication number | Publication date
EP3629605B1|2022-03-02|
CN107071685B|2020-02-14|
EP2873253A1|2015-05-20|
JP6696011B2|2020-05-20|
CN107071686B|2020-02-14|
US20180367934A1|2018-12-20|
KR20200019778A|2020-02-24|
US20200252737A1|2020-08-06|
AU2013292057A1|2015-03-05|
AU2019201900B2|2021-03-04|
KR102201034B1|2021-01-11|
CN106658342A|2017-05-10|
US20210258708A1|2021-08-19|
CN104584588B|2017-03-29|
US9961470B2|2018-05-01|
US10306393B2|2019-05-28|
AU2019201900A1|2019-04-11|
CN107071686A|2017-08-18|
US10939220B2|2021-03-02|
KR102079680B1|2020-02-20|
CN107071687B|2020-02-14|
HK1210562A1|2016-04-22|
CN106658342B|2020-02-14|
AU2013292057B2|2017-04-13|
AU2017203820B2|2018-12-20|
JP2018038055A|2018-03-08|
US10595145B2|2020-03-17|
JP6230602B2|2017-11-15|
WO2014012945A1|2014-01-23|
CN106658343B|2018-10-19|
JP6472499B2|2019-02-20|
US20150163615A1|2015-06-11|
KR20210005321A|2021-01-13|
AU2021203484A1|2021-06-24|
BR112015001128A2|2017-06-27|
CN106658343A|2017-05-10|
JP2015528248A|2015-09-24|
JP2019092181A|2019-06-13|
JP2020129811A|2020-08-27|
CN107071685A|2017-08-18|
CN104584588A|2015-04-29|
EP3629605A1|2020-04-01|
US20180206051A1|2018-07-19|
JP2021185704A|2021-12-09|
US20190349700A1|2019-11-14|
KR20150036056A|2015-04-07|
EP2873253B1|2019-11-13|
US9712938B2|2017-07-18|
CN107071687A|2017-08-18|
US20170289725A1|2017-10-05|
JP6934979B2|2021-09-15|
US10075799B2|2018-09-11|
AU2017203820A1|2017-06-22|
Cited documents:
Publication number | Filing date | Publication date | Applicant | Patent title

US5889867A|1996-09-18|1999-03-30|Bauck; Jerald L.|Stereophonic Reformatter|
US6645261B2|2000-03-06|2003-11-11|Cargill, Inc.|Triacylglycerol-based alternative to paraffin wax|
US7949141B2|2003-11-12|2011-05-24|Dolby Laboratories Licensing Corporation|Processing audio signals with head related transfer function filters and a reverberator|
CN1677493A|2004-04-01|2005-10-05|北京宫羽数字技术有限责任公司|Intensified audio-frequency coding-decoding device and method|
US9113281B2|2009-10-07|2015-08-18|The University Of Sydney|Reconstruction of a recorded sound field|
TWI444989B|2010-01-22|2014-07-11|Dolby Lab Licensing Corp|Using multichannel decorrelation for improved multichannel upmixing|
PL2553947T3|2010-03-26|2014-08-29|Thomson Licensing|Method and device for decoding an audio soundfield representation for audio playback|
NZ587483A|2010-08-20|2012-12-21|Ind Res Ltd|Holophonic speaker system with filters that are pre-configured based on acoustic transfer functions|
US9271081B2|2010-08-27|2016-02-23|Sonicemotion Ag|Method and device for enhanced sound field reproduction of spatially encoded audio input signals|
EP2451196A1|2010-11-05|2012-05-09|Thomson Licensing|Method and apparatus for generating and for decoding sound field data including ambisonics sound field data of an order higher than three|
EP2450880A1|2010-11-05|2012-05-09|Thomson Licensing|Data structure for Higher Order Ambisonics audio data|
US9288603B2|2012-07-15|2016-03-15|Qualcomm Incorporated|Systems, methods, apparatus, and computer-readable media for backward-compatible audio coding|
US9473870B2|2012-07-16|2016-10-18|Qualcomm Incorporated|Loudspeaker position compensation with 3D-audio hierarchical coding|
US9479886B2|2012-07-20|2016-10-25|Qualcomm Incorporated|Scalable downmix design with feedback for object-based surround codec|
US9761229B2|2012-07-20|2017-09-12|Qualcomm Incorporated|Systems, methods, apparatus, and computer-readable media for audio object clustering|
US9736609B2|2013-02-07|2017-08-15|Qualcomm Incorporated|Determining renderers for spherical harmonic coefficients|
CA2949108C|2014-05-30|2019-02-26|Qualcomm Incorporated|Obtaining sparseness information for higher order ambisonic audio renderers|
EP3149972B1|2014-05-30|2018-08-15|Qualcomm Incorporated|Obtaining symmetry information for higher order ambisonic audio renderers|
US9883310B2|2013-02-08|2018-01-30|Qualcomm Incorporated|Obtaining symmetry information for higher order ambisonic audio renderers|
US10178489B2|2013-02-08|2019-01-08|Qualcomm Incorporated|Signaling audio rendering information in a bitstream|
US9609452B2|2013-02-08|2017-03-28|Qualcomm Incorporated|Obtaining sparseness information for higher order ambisonic audio renderers|
US20140355769A1|2013-05-29|2014-12-04|Qualcomm Incorporated|Energy preservation for decomposed representations of a sound field|
US9466305B2|2013-05-29|2016-10-11|Qualcomm Incorporated|Performing positional analysis to code spherical harmonic coefficients|
EP2866475A1|2013-10-23|2015-04-29|Thomson Licensing|Method for and apparatus for decoding an audio soundfield representation for audio playback using 2D setups|
EP2879408A1|2013-11-28|2015-06-03|Thomson Licensing|Method and apparatus for higher order ambisonics encoding and decoding using singular value decomposition|
US9922656B2|2014-01-30|2018-03-20|Qualcomm Incorporated|Transitioning of ambient higher-order ambisonic coefficients|
US9502045B2|2014-01-30|2016-11-22|Qualcomm Incorporated|Coding independent frames of ambient higher-order ambisonic coefficients|
US9620137B2|2014-05-16|2017-04-11|Qualcomm Incorporated|Determining between scalar and vector quantization in higher order ambisonic coefficients|
US9852737B2|2014-05-16|2017-12-26|Qualcomm Incorporated|Coding vectors decomposed from higher-order ambisonics audio signals|
US10770087B2|2014-05-16|2020-09-08|Qualcomm Incorporated|Selecting codebooks for coding vectors decomposed from higher-order ambisonic audio signals|
US9536531B2|2014-08-01|2017-01-03|Qualcomm Incorporated|Editing of higher-order ambisonic audio data|
US9747910B2|2014-09-26|2017-08-29|Qualcomm Incorporated|Switching between predictive and non-predictive quantization techniques in a higher order ambisonics framework|
US10516782B2|2015-02-03|2019-12-24|Dolby Laboratories Licensing Corporation|Conference searching and playback of search results|
EP3314916B1|2015-06-25|2020-07-29|Dolby Laboratories Licensing Corporation|Audio panning transformation system and method|
EP3329486B1|2015-07-30|2020-07-29|Dolby International AB|Method and apparatus for generating from an hoa signal representation a mezzanine hoa signal representation|
US9961467B2|2015-10-08|2018-05-01|Qualcomm Incorporated|Conversion from channel-based audio to HOA|
US10249312B2|2015-10-08|2019-04-02|Qualcomm Incorporated|Quantization of spatial vectors|
US10070094B2|2015-10-14|2018-09-04|Qualcomm Incorporated|Screen related adaptation of higher order ambisonic content|
FR3052951B1|2016-06-20|2020-02-28|Arkamys|METHOD AND SYSTEM FOR OPTIMIZING THE LOW FREQUENCY AUDIO RENDERING OF AN AUDIO SIGNAL|
US10182303B1|2017-07-12|2019-01-15|Google Llc|Ambisonics sound field navigation using directional decomposition and path distance estimation|
US10015618B1|2017-08-01|2018-07-03|Google Llc|Incoherent idempotent ambisonics rendering|
CN107820166B|2017-11-01|2020-01-07|江汉大学|Dynamic rendering method of sound object|
US20200105282A1|2018-10-02|2020-04-02|Qualcomm Incorporated|Flexible rendering of audio data|
Legal status:
2017-12-05| B25A| Requested transfer of rights approved|Owner name: DOLBY INTERNATIONAL AB (NL) |
2018-12-04| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|
2020-06-02| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2021-07-06| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2021-09-08| B16A| Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 16/07/2013, SUBJECT TO THE LEGAL CONDITIONS. |
Priority:
Application number | Filing date | Patent title
EP12305862|2012-07-16|
EP12305862.0|2012-07-16|
PCT/EP2013/065034|WO2014012945A1|2012-07-16|2013-07-16|Method and device for rendering an audio soundfield representation for audio playback|